Monte Carlo Matrix Inversion Policy Evaluation
Authors
Abstract
Forsythe and Leibler (1950) introduced a statistical technique for finding the inverse of a matrix by characterizing the elements of the matrix inverse as expected values of a sequence of random walks. Barto and Duff (1994) subsequently showed how this technique relates to standard dynamic programming and temporal differencing methods. The advantage of the Monte Carlo matrix inversion (MCMI) approach is that it scales better with respect to state-space size than alternative techniques. In this paper, we introduce an algorithm for performing reinforcement learning policy evaluation using MCMI. We demonstrate that MCMI achieves accuracy similar to a maximum likelihood (ML) model-based policy evaluation approach while avoiding ML's slow execution time. In fact, we show that the runtime of MCMI is similar to that of temporal differencing (TD). We then illustrate a least-squares generalization technique for scaling up MCMI to large state spaces. We compare this least-squares Monte Carlo matrix inversion (LS-MCMI) technique to the least-squares temporal differencing (LSTD) approach introduced by Bradtke and Barto (1996), demonstrating that LS-MCMI and LSTD have similar runtimes.
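The random-walk characterization described in the abstract can be illustrated with a minimal sketch. It assumes a known transition matrix `P` under a fixed policy, a reward vector `R`, and a discount factor `gamma`, and it uses the classic absorbing-walk estimator: a walk that stops with probability `1 - gamma` at each step visits state `j` an expected `[(I - gamma P)^{-1}]_{ij}` times when started from state `i`. The function names and the known-model setting are assumptions made for illustration; the algorithm in the paper, which works from sampled experience, may differ.

```python
import numpy as np

def mcmi_inverse_row(P, gamma, start, n_walks, rng):
    """Estimate row `start` of (I - gamma * P)^{-1} by random walks.

    Each walk begins in `start`, records a visit to its current state,
    then stops with probability 1 - gamma or moves on according to P.
    The average visit count to state j estimates the (start, j) entry.
    """
    n = P.shape[0]
    visits = np.zeros(n)
    for _ in range(n_walks):
        s = start
        while True:
            visits[s] += 1.0
            if rng.random() >= gamma:  # absorb with probability 1 - gamma
                break
            s = rng.choice(n, p=P[s])  # otherwise take one step of the chain
    return visits / n_walks

def mcmi_policy_value(P, R, gamma, start, n_walks=5000, seed=0):
    """Estimate V(start) = [(I - gamma * P)^{-1} R]_start (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return mcmi_inverse_row(P, gamma, start, n_walks, rng) @ R

if __name__ == "__main__":
    # Small illustrative chain; the numbers are made up for the example.
    P = np.array([[0.5, 0.3, 0.2],
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
    R = np.array([1.0, 0.0, 2.0])
    gamma = 0.8
    exact = np.linalg.solve(np.eye(3) - gamma * P, R)
    print("Monte Carlo estimate:", mcmi_policy_value(P, R, gamma, start=0))
    print("Exact value:         ", exact[0])
```

Only one row of the inverse is estimated per start state, which is why this style of estimator can be attractive when the value of a few states is needed rather than the full inverse.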
Similar resources
Error Bounds in Reinforcement Learning Policy Evaluation
With the advent of Kearns & Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method as well as a Monte Carlo matrix inversion policy evaluation error bound. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo m...
Monte Carlo Matrix Inversion and Reinforcement Learning
We describe the relationship between certain reinforcement learning (RL) methods based on dynamic programming (DP) and a class of unorthodox Monte Carlo methods for solving systems of linear equations proposed in the 1950s. These methods recast the solution of the linear system as the expected value of a statistic suitably defined over sample paths of a Markov chain. The significance of our ob...
A Monte Carlo algorithm for efficient large matrix inversion
This paper introduces a new Monte Carlo algorithm to invert large matrices. It is based on simultaneous coupled draws from two random vectors whose covariance is the required inverse. It can be considered a generalization of a previously reported algorithm for Hermitian matrix inversion based on only one draw. The use of two draws allows the inversion of non-Hermitian matrices. Both the condi...
Propagation of Errors for Matrix Inversion
A formula is given for the propagation of errors during matrix inversion. For a 2 × 2 matrix, an explicit calculation using the formula is compared with a Monte Carlo calculation. A prescription is given to determine when a matrix with uncertain elements is sufficiently nonsingular for the calculation of the covariances of the inverted matrix elements to be reliable.
Loss of Load Expectation Assessment in Deregulated Power Systems Using Monte Carlo Simulation and Intelligent Systems
Deregulation policy has changed some of the concepts of power system reliability assessment and enhancement. In this paper, generation reliability is considered, and a method for its assessment using intelligent systems is proposed. Also, because of the stochastic behavior of the power market and of generators’ forced outages, Monte Carlo simulation is used for reliability evaluation. Generation r...